
    Neural 3D Morphable Models: Spiral Convolutional Networks for 3D Shape Representation Learning and Generation

    Generative models for 3D geometric data arise in many important applications in 3D computer vision and graphics. In this paper, we focus on 3D deformable shapes that share a common topological structure, such as human faces and bodies. Morphable Models and their variants, despite their linear formulation, have been widely used for shape representation, while most of the recently proposed nonlinear approaches resort to intermediate representations, such as 3D voxel grids or 2D views. In this work, we introduce a novel graph convolutional operator, acting directly on the 3D mesh, that explicitly models the inductive bias of the fixed underlying graph. This is achieved by enforcing consistent local orderings of the vertices of the graph, through the spiral operator, thus breaking the permutation invariance property that is adopted by all the prior work on Graph Neural Networks. Our operator comes by construction with desirable properties (anisotropic, topology-aware, lightweight, easy-to-optimise), and by using it as a building block for traditional deep generative architectures, we demonstrate state-of-the-art results on a variety of 3D shape datasets compared to the linear Morphable Model and other graph convolutional operators. Comment: to appear at ICCV 2019.
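
    As a rough illustration of the spiral operator described above, the sketch below (in PyTorch; all names and shapes are chosen here for illustration and are not taken from the paper) gathers each vertex's neighbours in a fixed, precomputed spiral ordering and passes the concatenated features through one shared linear layer. The fixed ordering is exactly what makes the filter anisotropic rather than permutation-invariant:

```python
import torch
import torch.nn as nn

class SpiralConv(nn.Module):
    """Minimal sketch of a spiral convolution: for each vertex, gather the
    features of its neighbours in a fixed, precomputed spiral ordering and
    map the concatenation through a single shared linear layer."""

    def __init__(self, in_channels, out_channels, spiral_length):
        super().__init__()
        # One weight matrix shared across vertices; the fixed spiral ordering
        # makes the filter anisotropic (direction-aware) and topology-aware.
        self.fc = nn.Linear(in_channels * spiral_length, out_channels)

    def forward(self, x, spiral_indices):
        # x: (batch, num_vertices, in_channels)
        # spiral_indices: (num_vertices, spiral_length) long tensor of
        # neighbour indices, identical for every mesh sharing this topology.
        b, v, c = x.shape
        gathered = x[:, spiral_indices]        # (b, v, spiral_length, c)
        gathered = gathered.reshape(b, v, -1)  # concatenate along the spiral
        return self.fc(gathered)
```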

    Dynamic Neural Portraits

    We present Dynamic Neural Portraits, a novel approach to the problem of full-head reenactment. Our method generates photo-realistic video portraits by explicitly controlling head pose, facial expressions and eye gaze. Our proposed architecture is different from existing methods that rely on GAN-based image-to-image translation networks for transforming renderings of 3D faces into photo-realistic images. Instead, we build our system upon a 2D coordinate-based MLP with controllable dynamics. Our intuition to adopt a 2D-based representation, as opposed to recent 3D NeRF-like systems, stems from the fact that video portraits are captured by monocular stationary cameras; therefore, only a single viewpoint of the scene is available. Primarily, we condition our generative model on expression blendshapes; nonetheless, we show that our system can also be successfully driven by audio features. Our experiments demonstrate that the proposed method is 270 times faster than recent NeRF-based reenactment methods, with our networks achieving speeds of 24 fps for resolutions up to 1024 x 1024, while outperforming prior works in terms of visual quality. Comment: In IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023.
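
    A minimal sketch of the kind of 2D coordinate-based MLP the abstract describes, assuming a standard sinusoidal positional encoding of pixel coordinates and a conditioning vector (e.g. blendshape weights, pose, gaze); the architecture below is illustrative and not the authors' network:

```python
import torch
import torch.nn as nn

class CoordinatePortraitMLP(nn.Module):
    """Illustrative 2D coordinate-based generator: each pixel coordinate,
    together with a per-frame conditioning vector, is mapped to an RGB value.
    A sinusoidal positional encoding is assumed, as in most coordinate MLPs."""

    def __init__(self, cond_dim, hidden=256, num_freqs=10):
        super().__init__()
        self.num_freqs = num_freqs
        in_dim = 2 * 2 * num_freqs + cond_dim  # sin/cos per frequency per axis
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )

    def positional_encoding(self, xy):
        # xy: (n, 2) pixel coordinates normalised to [-1, 1]
        freqs = 2.0 ** torch.arange(self.num_freqs, device=xy.device)
        angles = xy[..., None] * freqs           # (n, 2, num_freqs)
        enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
        return enc.flatten(start_dim=1)          # (n, 4 * num_freqs)

    def forward(self, xy, cond):
        # cond: (n, cond_dim) per-pixel copy of the conditioning vector
        feats = torch.cat([self.positional_encoding(xy), cond], dim=-1)
        return self.net(feats)
```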

    3D face morphable models "In-The-Wild"

    3D Morphable Models (3DMMs) are powerful statistical models of 3D facial shape and texture, and among the state-of-the-art methods for reconstructing facial shape from single images. With the advent of new 3D sensors, many 3D facial datasets have been collected containing both neutral as well as expressive faces. However, all of these datasets are captured under controlled conditions. Thus, even though powerful 3D facial shape models can be learnt from such data, it is difficult to build statistical texture models that are sufficient to reconstruct faces captured in unconstrained conditions (in-the-wild). In this paper, we propose the first, to the best of our knowledge, in-the-wild 3DMM by combining a powerful statistical model of facial shape, which describes both identity and expression, with an in-the-wild texture model. We show that the employment of such an in-the-wild texture model greatly simplifies the fitting procedure, because there is no need to optimise with respect to the illumination parameters. Furthermore, we propose a new fast algorithm for fitting the 3DMM in arbitrary images. Finally, we have captured the first 3D facial database with relatively unconstrained conditions and report quantitative evaluations with state-of-the-art performance. Complementary qualitative reconstruction results are demonstrated on standard in-the-wild facial databases.
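
    The shape model described here is the standard linear 3DMM with separate identity and expression bases; a minimal sketch of the synthesis step, with all names and dimensions as placeholders:

```python
import numpy as np

def synthesize_shape(mean_shape, U_id, U_exp, p_id, p_exp):
    """Linear 3DMM shape synthesis: a mean shape plus identity and expression
    bases scaled by their coefficient vectors (names here are illustrative).

    mean_shape: (3n,)   stacked xyz coordinates of n vertices
    U_id:       (3n, k) identity basis (e.g. PCA components)
    U_exp:      (3n, m) expression basis
    p_id, p_exp: coefficient vectors of length k and m
    """
    return mean_shape + U_id @ p_id + U_exp @ p_exp

# Example with random placeholders standing in for a trained model:
n, k, m = 5000, 100, 30
rng = np.random.default_rng(0)
shape = synthesize_shape(
    rng.standard_normal(3 * n),
    rng.standard_normal((3 * n, k)),
    rng.standard_normal((3 * n, m)),
    rng.standard_normal(k),
    rng.standard_normal(m),
).reshape(n, 3)  # back to per-vertex xyz
```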

    GANFIT: Generative adversarial network fitting for high fidelity 3D face reconstruction

    In the past few years, a lot of work has been done towards reconstructing the 3D facial structure from single images by capitalizing on the power of Deep Convolutional Neural Networks (DCNNs). In the most recent works, differentiable renderers were employed in order to learn the relationship between the facial identity features and the parameters of a 3D morphable model for shape and texture. The texture features either correspond to components of a linear texture space or are learned by auto-encoders directly from in-the-wild images. In all cases, the facial texture reconstruction quality of state-of-the-art methods still falls short of high fidelity. In this paper, we take a radically different approach and harness the power of Generative Adversarial Networks (GANs) and DCNNs in order to reconstruct the facial texture and shape from single images. That is, we utilize GANs to train a very powerful generator of facial texture in UV space. Then, we revisit the original 3D Morphable Model (3DMM) fitting approaches, which make use of non-linear optimization to find the optimal latent parameters that best reconstruct the test image, but under a new perspective. We optimize the parameters with the supervision of pretrained deep identity features through our end-to-end differentiable framework. We demonstrate excellent results in photorealistic and identity-preserving 3D face reconstructions and achieve for the first time, to the best of our knowledge, facial texture reconstruction with high-frequency details.
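
    A hedged sketch of the fitting loop the abstract outlines: the GAN latent for the UV texture and the 3DMM shape coefficients are optimised through a differentiable renderer under a pretrained identity network. All four components below (`generator`, `shape_model`, `renderer`, `id_net`) are stand-ins for the paper's modules, not its actual interfaces:

```python
import torch

def fit_to_image(image, generator, shape_model, renderer, id_net, steps=200):
    """Sketch of GAN-based 3DMM fitting: optimise a latent texture code and
    shape coefficients so the rendering matches the target image under a
    pretrained identity network. Dimensions below are placeholders."""
    z_tex = torch.zeros(1, 512, requires_grad=True)    # GAN latent (UV texture)
    p_shape = torch.zeros(1, 200, requires_grad=True)  # 3DMM shape coefficients
    opt = torch.optim.Adam([z_tex, p_shape], lr=5e-3)

    with torch.no_grad():
        target_id = id_net(image)                      # deep identity features

    for _ in range(steps):
        opt.zero_grad()
        texture = generator(z_tex)                     # UV texture from the GAN
        mesh = shape_model(p_shape)                    # vertices from the 3DMM
        rendering = renderer(mesh, texture)            # differentiable render
        # Identity loss (cosine distance) plus a pixel term and a latent prior.
        loss = (1 - torch.cosine_similarity(id_net(rendering), target_id).mean()
                + (rendering - image).abs().mean()
                + 1e-4 * z_tex.pow(2).mean())
        loss.backward()
        opt.step()
    return z_tex.detach(), p_shape.detach()
```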

    Towards a complete 3D morphable model of the human head

    Three-dimensional Morphable Models (3DMMs) are powerful statistical tools for representing the 3D shapes and textures of an object class. Here we present the most complete 3DMM of the human head to date that includes face, cranium, ears, eyes, teeth and tongue. To achieve this, we propose two methods for combining existing 3DMMs of different overlapping head parts: i. use a regressor to complete missing parts of one model using the other, ii. use the Gaussian Process framework to blend covariance matrices from multiple models. Thus we build a new combined face-and-head shape model that blends the variability and facial detail of an existing face model (the LSFM) with the full head modelling capability of an existing head model (the LYHM). Then we construct and fuse a highly-detailed ear model to extend the variation of the ear shape. Eye and eye region models are incorporated into the head model, along with basic models of the teeth, tongue and inner mouth cavity. The new model achieves state-of-the-art performance. We use our model to reconstruct full head representations from single, unconstrained images allowing us to parameterize craniofacial shape and texture, along with the ear shape, eye gaze and eye color. Comment: 18 pages, 18 figures; submitted to Transactions on Pattern Analysis and Machine Intelligence (TPAMI) on the 9th of October as an extension of the original oral CVPR paper: arXiv:1903.0378
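
    As a toy illustration of method ii. (the paper performs the blending within a Gaussian Process framework, which this sketch does not reproduce), one can picture the core operation as averaging two zero-padded part-model covariances over the coordinates where both models have support:

```python
import numpy as np

def blend_covariances(cov_a, cov_b, mask_a, mask_b, w=0.5):
    """Toy illustration of combining two shape covariances defined on
    partially overlapping subsets of a universal vertex set: average the
    matrices where both models are defined, and keep the single available
    one elsewhere. Note that this naive blend is not guaranteed to be
    positive semi-definite; the paper's Gaussian Process framework is what
    keeps the combined model consistent.

    cov_a, cov_b: (d, d) covariances zero-padded to the full dimension d
    mask_a, mask_b: (d,) booleans marking where each model has support
    """
    both = np.outer(mask_a & mask_b, mask_a & mask_b)
    only_a = np.outer(mask_a, mask_a) & ~both
    only_b = np.outer(mask_b, mask_b) & ~both
    out = np.zeros_like(cov_a)
    out[both] = w * cov_a[both] + (1 - w) * cov_b[both]
    out[only_a] = cov_a[only_a]
    out[only_b] = cov_b[only_b]
    return out
```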

    3D head morphable models and beyond: algorithms and applications

    It has been more than 20 years since the introduction of 3D Morphable Models (3DMMs) in the computer vision literature. They were proposed as a face representation based on principal component analysis for the tasks of image analysis, photorealistic manipulation, and 3D reconstruction from single images. Even so, to this date, the applications of such models are limited by a number of factors. Firstly, correctly training 3DMMs requires a vast amount of 3D data, which most of the time is not publicly available to the research community due to increasingly stringent data protection regulations. Hence, it is extremely difficult to combine and enrich multiple attributes of the human face/head without the initial 3D images. Additionally, many 3DMMs utilize different templates that describe distinct parts of the human face/head (i.e., face, cranium, ears, eyes) that partly overlap with each other and capture statistical variations which are extremely difficult to incorporate into one single universal morphable model. Moreover, despite the increasing level of detail in 3D face reconstruction from in-the-wild images, mainly attributed to recent advancements in deep learning, none of the available methods in the literature deal with the human tongue, which is important for speech dynamics and improves the realism of the oral cavity. Finally, there is limited work on 3D facial geometric enhancement and translation between different capturing systems, due to the extremely limited availability of 3D datasets tailored for this task. This thesis aims at tackling these shortcomings in all four domains. A novel approach for combining and enriching existing 3DMMs without the underlying raw data is proposed. We introduce two methods for solving this problem: i. use a regressor to complete missing parts of one model using the other, ii. use a Gaussian Process framework to blend covariance matrices from multiple models. We showcase our approach by combining existing face and head 3DMMs with different templates and statistical variations. Furthermore, we introduce to the research community the first Universal Head Model (UHM), which holds important statistical variation across all key structures of the human head that contribute to the appearance and identity of a person. We later showcase how this model is used to create full head appearances from single in-the-wild images, thus making significant progress toward realistic human head digitization from data-deficient sources. Additionally, we present the first method that accurately reconstructs the human tongue from single images, utilizing a novel generative framework which directly models the highly deformable surface of the human tongue and seamlessly merges it with our universal head model for more realistic representations of oral cavity dynamics. Lastly, this thesis presents a novel generative pipeline capable of converting low-quality 3D facial scans into high-quality ones, which can potentially aid depth-sensor applications by increasing the quality of the output data while maintaining a low cost. It is also shown that the proposed framework can be extended to handle translations between various expressions on demand.
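
    Method i. above, completing the missing region of one model from the region shared with the other, can be pictured as a simple ridge regressor trained on samples from the model that covers both regions; the sketch below is illustrative only and not the thesis' actual formulation:

```python
import numpy as np

def fit_completion_regressor(samples_full, shared_idx, missing_idx, lam=1e-3):
    """Toy version of method i.: learn a linear (ridge) regressor that
    predicts the coordinates of a missing region (e.g. the cranium) from
    the region shared with the other model (e.g. the face). Training
    samples come from the model that covers both regions.

    samples_full: (num_samples, d) stacked vertex coordinates
    shared_idx, missing_idx: integer index arrays into the d coordinates
    """
    X = samples_full[:, shared_idx]
    Y = samples_full[:, missing_idx]
    # Ridge solution: W = (X^T X + lam * I)^{-1} X^T Y
    d = X.shape[1]
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
    # Completion of a new shape: new_shape[shared_idx] @ W
    return W
```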

    FitMe: Deep Photorealistic 3D Morphable Model Avatars

    In this paper, we introduce FitMe, a facial reflectance model and a differentiable rendering optimization pipeline, that can be used to acquire high-fidelity renderable human avatars from single or multiple images. The model consists of a multi-modal style-based generator, that captures facial appearance in terms of diffuse and specular reflectance, and a PCA-based shape model. We employ a fast differentiable rendering process that can be used in an optimization pipeline, while also achieving photorealistic facial shading. Our optimization process accurately captures both the facial reflectance and shape in high detail, by exploiting the expressivity of the style-based latent representation and of our shape model. FitMe achieves state-of-the-art reflectance acquisition and identity preservation on single "in-the-wild" facial images, while producing impressive scan-like results when given multiple unconstrained facial images pertaining to the same identity. In contrast with recent implicit avatar reconstructions, FitMe requires only one minute and produces relightable mesh and texture-based avatars that can be used by end-user applications. Comment: Accepted at CVPR 2023, project page at https://lattas.github.io/fitme, 17 pages including supplementary material.
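
    To give a sense of why separating diffuse and specular reflectance makes the recovered avatar relightable, here is a deliberately simple Blinn-Phong-style shading sketch (not the paper's actual reflectance model): the two reflectance maps enter the shading equation differently, so disentangling them lets the avatar be re-rendered under new lighting:

```python
import numpy as np

def shade(normals, view_dir, light_dir, diffuse_albedo, specular_albedo,
          shininess=32.0):
    """Tiny Blinn-Phong-style illustration of shading with separate diffuse
    and specular reflectance maps.

    normals:              (n, 3) unit surface normals
    view_dir, light_dir:  (3,) unit vectors
    diffuse_albedo, specular_albedo: (n, 3) per-point reflectance in [0, 1]
    """
    half = view_dir + light_dir
    half = half / np.linalg.norm(half)                    # half-way vector
    n_dot_l = np.clip(normals @ light_dir, 0.0, None)[:, None]
    n_dot_h = np.clip(normals @ half, 0.0, None)[:, None]
    # Diffuse term scales with n.l; specular term with a power of n.h.
    return diffuse_albedo * n_dot_l + specular_albedo * n_dot_h ** shininess
```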